Abstract

An important part of voice signal processing is a nonlinear operation along
frequency on the short-time spectrogram; the nonlinear adaptation along time is
better understood. We developed, computed and analyzed a class of nonlinear
nonlocal cochlear models to approximate this nonlinear aspect. The model is
mechanical in nature, and outputs the acoustic responses on the basilar membrane.
In the case of two or three tones, our results are in qualitative agreement with
existing data. We prove that the model is well-posed in Sobolev spaces for all time,
and admits exact multi-frequency solutions (quasi-periodic in time) if the
nonlinearity is cubic and weak enough. We upscale the model output towards modeling
psychoacoustic responses, to help direct applications in signal processing based on
first principles. For input of a tone plus a banded noise, we calibrate the model
with absolute masking thresholds (on noise only), then rely on model nonlinearity to
capture tonal masking of noise and modified thresholds resulting from their
interactions.


1  Statement  of  the  problem  studied 

Given an input sound signal, we studied the mechanical responses of the basilar
membrane (BM) with a nonlinear nonlocal model of transmission line type:

$$p_{xx} - N u_{tt} = \epsilon_s(x)\, u_t, \quad x \in (0, L), \tag{1.1}$$

$$p = m u_{tt} + r(x, |u|)\, u_t + s(x)\, u, \tag{1.2}$$

where $p$ is the fluid pressure difference across the BM, $u$ the BM displacement,
$L$ the longitudinal length of the BM; $N$ is a constant depending on fluid density
and cochlear channel size; $\epsilon_s(x)$ is a selective damping term operative near
$x \sim L$; $m$, $r$, $s$ are the mass, damping, and stiffness of the BM per unit
area, with $m$ a constant and $s$ a continuously differentiable nonnegative function
of $x$. The function(al) $r$ of $x$, $u$ is of the form:

$$r(x, |u|) = r_a + \gamma \int_0^L P(|u(x', t)|)\, K(x - x')\, dx'. \tag{1.3}$$

Here $r_a$ is the local part of BM damping, taken as a positive constant for
simplicity. In the nonlocal BM damping, $K = K(x)$ is a localized Lipschitz continuous
kernel function with total integral over $x \in \mathbb{R}^1$ equal to 1; $P(\cdot)$
is a nonnegative continuously differentiable function such that

$$P(0) = 0, \quad P(q) \le C(1 + q^2), \quad C > 0, \quad \forall q \ge 0. \tag{1.4}$$

The common choice is $P(q) = q$ (giving overall quadratic nonlinearity) or
$P(q) = q^2$ (overall cubic nonlinearity).
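As an illustration of how the nonlocal damping (1.3) might be evaluated on a grid, the following is a minimal sketch, not the authors' implementation. The Gaussian kernel, the trapezoidal quadrature, and all parameter values (`width`, `gamma`, the BM length 3.5, the test displacement) are assumptions chosen only for the example; the source requires only that $K$ be localized, Lipschitz continuous, and of unit integral.

```python
import math

def gaussian_kernel(z, width):
    """Hypothetical localized kernel K with unit integral over the real line."""
    return math.exp(-0.5 * (z / width) ** 2) / (width * math.sqrt(2.0 * math.pi))

def nonlocal_damping(u, dx, r_a, gamma, P, width):
    """Approximate r(x_i) = r_a + gamma * int_0^L P(|u(x')|) K(x_i - x') dx'
    on a uniform grid, using the trapezoidal rule for the integral."""
    n = len(u)
    g = [P(abs(value)) for value in u]
    r = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            weight = 0.5 if j in (0, n - 1) else 1.0  # trapezoid end weights
            total += weight * g[j] * gaussian_kernel((i - j) * dx, width)
        r.append(r_a + gamma * dx * total)
    return r

# Usage with assumed values: a displacement bump centered at x = 1.5 on [0, 3.5].
length = 3.5
n_points = 201
dx = length / (n_points - 1)
u = [1.0e-3 * math.exp(-(((i * dx) - 1.5) / 0.2) ** 2) for i in range(n_points)]
r = nonlocal_damping(u, dx, r_a=1.0, gamma=1.0e6, P=lambda q: q * q, width=0.1)
peak_index = max(range(n_points), key=lambda i: r[i])
```

With $P(q) = q^2$ the damping rises above $r_a$ only where the displacement is large, and the kernel spreads that increase to neighboring locations.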

The  physical  boundary  and  initial  conditions  are: 

$$p_x(0, t) = T_M p_T(t) \equiv f(t), \quad p(L, t) = 0, \tag{1.5}$$

$$u(x, 0) = u_0(x), \quad u_t(x, 0) = u_1(x), \tag{1.6}$$

where $p_T(t)$ is the input sound pressure at the eardrum, and $T_M$ is a bounded
linear operator on the space of bounded continuous functions, with output depending
on the frequency content of $p_T(t)$. If
$p_T = \sum_{j=1}^{J_M} A_j \exp\{i \omega_j t\} + \text{c.c.}$, a multitone input,
c.c. denoting complex conjugate, $J_M$ a positive integer, then
$T_M p_T(t) = \sum_{j=1}^{J_M} B_j \exp\{i \omega_j t\} + \text{c.c.}$, where
$B_j = a_M(\omega_j) A_j$ and $a_M(\cdot)$ is a scaling function built from the
filtering characteristics of the middle ear [8]. Established data are available in [7].
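The action of $T_M$ on a multitone input can be sketched as follows. This is only an illustration: the gain `toy_middle_ear_gain` is a made-up smooth function standing in for $a_M$, whose actual form comes from middle-ear data and is not given here.

```python
import cmath
import math

def toy_middle_ear_gain(omega):
    """Hypothetical stand-in for a_M(omega): a smooth gain equal to 1 at 1 kHz.
    The real a_M is built from middle-ear filtering data; this is a placeholder."""
    f = omega / (2.0 * math.pi)
    f0 = 1000.0
    return 1.0 / (1.0 + ((f - f0) / f0) ** 2)

def boundary_forcing(tones, t, gain=toy_middle_ear_gain):
    """Evaluate f(t) = sum_j B_j exp(i w_j t) + c.c. with B_j = a_M(w_j) A_j.
    `tones` is a list of (A_j, w_j) pairs; A_j may be complex."""
    total = 0.0
    for amplitude, omega in tones:
        b = gain(omega) * amplitude
        # z + conj(z) = 2 Re(z), so the sum with its complex conjugate is real.
        total += 2.0 * (b * cmath.exp(1j * omega * t)).real
    return total

# Usage: two tones at 1 kHz and 2 kHz (assumed amplitudes).
tones = [(1.0, 2.0 * math.pi * 1000.0), (0.5, 2.0 * math.pi * 2000.0)]
f_at_zero = boundary_forcing(tones, 0.0)
```

Because $T_M$ is linear, the forcing for several tones is the sum of the forcings for each tone alone; the nonlinearity enters only through the BM damping.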

The linear part of the model is well-known [11] except for the selective damping term
$\epsilon_s(x) u_t$, yet it is the nonlinear nonlocal part that plays the significant
role in generating nontrivial output for multi-tone input, where there is no linear
superposition. The model generalizes earlier ones in [9], [6], [5], [1].

The major problem is to compute and analyze the "long time" behavior of solutions, and
the energy distribution over frequencies. For example, if two tones at frequencies
$f_1$, $f_2$ and intensities $I_1$ and $I_2$ are taken as boundary input at $x = 0$,
what are the intensities at $f_1$ and $f_2$ after transients die out (typically after
20 ms (milliseconds))? Mathematically, one asks about the analytical temporal/spatial
structures of solutions at long times. Similarly, we study the effect of noise on
tones, and investigate how noise can be shadowed by tones at a later time (20 ms)
through nonlinear interactions.
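One way to read off the intensity at a given frequency after the transients have died out is to discard the first 20 ms of a computed time series and project the remainder onto a complex exponential. The sketch below assumes a uniformly sampled record at one BM location; the function names, the sampling step, and the synthetic test signal are illustrative and not taken from the source.

```python
import cmath
import math

def tone_amplitude(samples, dt, freq, t_skip=0.020):
    """Estimate the amplitude of the component at `freq` (Hz), using only
    samples at times t >= t_skip, by projection onto exp(-2*pi*i*freq*t)."""
    start = int(round(t_skip / dt))
    tail = samples[start:]
    if not tail:
        raise ValueError("record is shorter than the skipped transient")
    acc = 0j
    for k, value in enumerate(tail):
        t = (start + k) * dt
        acc += value * cmath.exp(-2j * math.pi * freq * t)
    return 2.0 * abs(acc) / len(tail)

def level_db(amplitude, reference=1.0):
    """Convert an amplitude to decibels relative to `reference`."""
    return 20.0 * math.log10(amplitude / reference)

# Synthetic record: two steady tones plus a decaying transient (assumed values).
dt = 1.0e-5
times = [k * dt for k in range(6000)]
signal = [0.5 * math.cos(2.0 * math.pi * 1000.0 * t)
          + 0.1 * math.cos(2.0 * math.pi * 1500.0 * t)
          + math.exp(-t / 0.003) for t in times]
a1 = tone_amplitude(signal, dt, 1000.0)
a2 = tone_amplitude(signal, dt, 1500.0)
```

The estimate is cleanest when the retained window holds a whole number of periods of each tone, as it does here (40 ms for 1 kHz and 1.5 kHz); otherwise a window function would be a reasonable addition.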


2  Summary  of  the  most  important  results 

(I) In [14], we used a semi-implicit finite difference method to compute
quasi-steady-state responses under one-, two-, or three-tone input at $x = 0$. For
one-tone input, we recover isodisplacement curves similar to those in [10], showing
sensitivities of BM responses near the input tones. For two and three input tones, we
varied the intensity and frequency of one tone, and computed the responses of the
other tones. In particular, increasing the intensity of one tone reduced the
intensities of the other tones, in agreement with experimental findings in
[2], [4], [3]. Moreover, we observed propagation of dispersive long waves passing
through quasi-steady-states and accumulating near $x = L$; we devised the term
$\epsilon_s(x) u_t$ to damp these out near the right boundary point.

The numerical computation of three-tone responses is hard to find in the literature,
and our contribution [14] appears to be the first.
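To make the structure of a time-domain computation concrete, the sketch below eliminates $u_{tt}$ between (1.1) and (1.2), which gives the boundary value problem $p_{xx} - (N/m)\, p = \epsilon_s u_t - (N/m)(r u_t + s u)$ with $p_x(0,t) = f(t)$, $p(L,t) = 0$, and then advances $(u, u_t)$ by one step. This is an assumed, simplified scheme (central differences, damping frozen at the current state, a symplectic-Euler update) and not the semi-implicit method of [14]; all function names and parameter values are hypothetical.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def pressure_from_state(u, v, r, s, eps_s, f, dx, N, m):
    """Solve p_xx - (N/m) p = eps_s v - (N/m)(r v + s u) with p_x(0) = f and
    p(L) = 0, where v stands for u_t, using second-order central differences."""
    n = len(u)
    k = N / m
    lower = [1.0 / dx ** 2] * n
    upper = [1.0 / dx ** 2] * n
    diag = [-2.0 / dx ** 2 - k] * n
    rhs = [eps_s[i] * v[i] - k * (r[i] * v[i] + s[i] * u[i]) for i in range(n)]
    # Neumann condition at x = 0 via a ghost point: p[-1] = p[1] - 2 dx f.
    upper[0] = 2.0 / dx ** 2
    rhs[0] += 2.0 * f / dx
    # Dirichlet condition p(L) = 0 in the last row.
    lower[-1], diag[-1], rhs[-1] = 0.0, 1.0, 0.0
    return solve_tridiagonal(lower, diag, upper, rhs)

def step(u, v, r, s, eps_s, f, dx, dt, N, m):
    """Advance (u, u_t) by dt using u_tt = (p - r u_t - s u) / m from (1.2)."""
    n = len(u)
    p = pressure_from_state(u, v, r, s, eps_s, f, dx, N, m)
    v_new = [v[i] + dt * (p[i] - r[i] * v[i] - s[i] * u[i]) / m for i in range(n)]
    u_new = [u[i] + dt * v_new[i] for i in range(n)]
    return u_new, v_new, p

# Check against the exact solution for u = u_t = 0, N = m = 1, L = 1, f = 1:
# p(x) = sinh(x - 1) / cosh(1), so p(0) = -tanh(1).
n_points = 101
dx = 1.0 / (n_points - 1)
zeros = [0.0] * n_points
ones = [1.0] * n_points
p0 = pressure_from_state(zeros, zeros, ones, ones, zeros, 1.0, dx, 1.0, 1.0)
error_at_zero = abs(p0[0] + math.tanh(1.0))
```

In a full computation the damping `r` would be re-evaluated from (1.3) at every step, which is where the nonlinear nonlocal coupling between tones enters.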

(II)  In  [15],  we  proved  that: 


Theorem 2.1 (Global Well-posedness) The model cochlear system (1.1)-(1.5) has unique
solutions:


$$(u, p) \in C([0, \infty);\ H^1([0, L]) \times H^3([0, L])).$$

If  additionally: 

$$s(x) \ge s_0 > 0; \qquad \|\ldots\|_2 < \ldots \tag{2.7}$$

for some positive constant $s_0$, then:

$$\ldots \le O(t^{1/2}), \qquad t^{-1} \int_0^t \|\ldots\|_2^2(t')\, dt' \le O(1), \tag{2.8}$$

for  all  t  >  0. 
The proof is based on writing the pressure $p$ in terms of the displacement $u$, and
performing energy estimates on $(u, u_t)$; $\|\cdot\|_2$ is the standard $L^2$ norm.
We also constructed exact solutions with multiple frequencies in time:

Theorem 2.2 (Multitone Solutions) Let $P(u) = u^2$, so that the overall nonlinearity
is cubic, and let the input boundary condition be:

$$f(t) = \sum_{j=1}^{m} a_j \exp\{i \omega_j t\} + \text{c.c.},$$

for $m \in \mathbb{Z}$ (any positive integer). Fix $\rho > 1$. Then for $\gamma$ small
enough, the cochlear system (1.1)-(1.5) has a solution of the form:

$$u(x, t) = \sum_{k \in \mathbb{Z}^m} U_k(x) \exp\{i k \cdot \omega t\}, \tag{2.9}$$

where:

$$\omega = (\omega_1, \omega_2, \ldots, \omega_m),$$

$U_k(x) \in H^2([0, L])$ such that:

$$\sum_{k \in \mathbb{Z}^m} \rho^{|k|} |U_k(x)| < \infty,$$

uniformly in $x \in [0, L]$. The pressure $p$ is similar.

The proof depends on estimates of linear operators indexed by $k \in \mathbb{Z}^m$,
which are elliptic with complex coefficients. The exact solution is found by the
contraction mapping theorem in the Banach space whose elements have expansions as in
(2.9). We see that many more frequencies are generated by the $m$ input ones
$\omega_1, \ldots, \omega_m$, for example the commonly observed
$2\omega_1 - \omega_2$, etc.
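The frequencies $k \cdot \omega$ appearing in an expansion of the form (2.9) can be enumerated directly. The short sketch below lists the positive combination frequencies up to a chosen order $|k| = |k_1| + \cdots + |k_m|$; the cutoff `max_order` and the two example tones (1000 Hz and 1200 Hz) are assumptions for illustration, and the code makes no claim about which of these frequencies carry significant energy in the model.

```python
from itertools import product

def combination_frequencies(freqs, max_order):
    """Map each positive frequency k . f (in Hz) to the integer vectors k with
    0 < |k_1| + ... + |k_m| <= max_order that produce it."""
    table = {}
    index_range = range(-max_order, max_order + 1)
    for k in product(index_range, repeat=len(freqs)):
        order = sum(abs(c) for c in k)
        if order == 0 or order > max_order:
            continue
        f = sum(c * fj for c, fj in zip(k, freqs))
        if f > 0:
            table.setdefault(round(f, 6), []).append(k)
    return table

# Usage: two input tones, combinations up to third order.
table = combination_frequencies([1000.0, 1200.0], max_order=3)
cubic_difference_tone = table[800.0]  # contains (2, -1), i.e. 2 f1 - f2
```

For these two tones the index vector $(2, -1)$ gives 800 Hz and $(-1, 2)$ gives 1400 Hz, the combination tones of the form $2\omega_1 - \omega_2$ and $2\omega_2 - \omega_1$.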

(III) In [16], we relate the mechanical output on the BM to masking thresholds in
psychoacoustics (perception). The absolute hearing thresholds for banded noise are
known, see Fig. 1. We are interested in producing new thresholds when tones are
added. For example, if we insert a tone at 2 kHz and 80 dB, we ask what the maximal
level of noise is that cannot be heard in the presence of the tone. The new threshold
curve (masking curve) is shown in Fig. 2, plotting masked noise intensity vs. noise
frequency. The idea is: (step 1) use Fig. 1 to determine the corresponding BM
displacement thresholds for noise input; (step 2) when both tones and noise are sent
in, extract the noisy components of the BM displacements, and calculate the masking
threshold levels with the BM displacement thresholds of step 1. The overall model
exploits information at both the microscopic (BM) level and the macroscopic
(psychoacoustic) level. We are performing extensive validation tests on this
first-principle-based multi-scale PDE approach to psychoacoustics.
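The two-step calibration can be illustrated with a toy stand-in for the cochlear model. In the sketch below, `toy_noise_displacement` is an invented response function (noise displacement reduced by a factor that grows with tone level); it is not the PDE model, and the `suppression` constant and search bracket are arbitrary. Only the logic of step 1 (absolute threshold to displacement threshold) and step 2 (search for the noise level that reaches that displacement when the tone is present) reflects the procedure described above.

```python
import math

def toy_noise_displacement(noise_db, tone_db=None, suppression=1.0e-3):
    """Hypothetical noise component of BM displacement. With a tone present
    the noise response is reduced; the functional form is purely illustrative."""
    amplitude = 10.0 ** (noise_db / 20.0)
    if tone_db is None:
        return amplitude
    return amplitude / (1.0 + suppression * 10.0 ** (tone_db / 20.0))

def masked_threshold_db(abs_threshold_db, tone_db, response,
                        lo=-20.0, hi=140.0, tol=1.0e-6):
    """Step 1: displacement threshold from the absolute (noise-only) threshold.
    Step 2: bisect for the noise level whose response, with the tone present,
    equals that displacement threshold. Assumes `response` increases with level."""
    displacement_threshold = response(abs_threshold_db, None)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if response(mid, tone_db) < displacement_threshold:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage with assumed numbers: 10 dB absolute threshold, 80 dB tone.
new_threshold = masked_threshold_db(10.0, 80.0, toy_noise_displacement)
```

In the actual approach the response function would be replaced by the noisy component extracted from the computed BM displacement, evaluated band by band across noise frequencies.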
[Figure 2 plot: "Noise Masking Thresholds in the Presence of Tone (2 kHz, 80 dB)";
horizontal axis: Frequency of Masked Noise in Hz; vertical axis: Intensity in dB of
Noise at Thresholds.]

Figure 2: Masking thresholds on banded noise (bandwidth 300 Hz) in the presence of a
tone (2 kHz, 80 dB).


The masking curve in Fig. 2 forms a peak centered at the tone frequency 2 kHz, which
helps to compress noise whose intensities fall under the curve. Such a technique can
be utilized for voice/speech recognition in conjunction with time adaptation. A
related but less first-principle-based procedure (so-called peak focusing) was
adopted for this purpose in our earlier works [12], [13].